Genome‐wide association analysis of hyperspectral reflectance data to dissect the genetic architecture of growth‐related traits in maize under plant growth‐promoting bacteria inoculation

Abstract Plant growth‐promoting bacteria (PGPB) may be of use for increasing crop yield and plant resilience to biotic and abiotic stressors. Using hyperspectral reflectance data to assess growth‐related traits may shed light on the underlying genetics as such data can help assess biochemical and physiological traits. This study aimed to integrate hyperspectral reflectance data with genome‐wide association analyses to examine maize growth‐related traits under PGPB inoculation. A total of 360 inbred maize lines with 13,826 single nucleotide polymorphisms (SNPs) were evaluated with and without PGPB inoculation; 150 hyperspectral wavelength reflectances at 386–1021 nm and 131 hyperspectral indices were used in the analysis. Plant height, stalk diameter, and shoot dry mass were measured manually. Overall, hyperspectral signatures produced similar or higher genomic heritability estimates than those of manually measured phenotypes, and they were genetically correlated with manually measured phenotypes. Furthermore, several hyperspectral reflectance values and spectral indices were identified by genome‐wide association analysis as potential markers for growth‐related traits under PGPB inoculation. Eight SNPs were detected, which were commonly associated with manually measured and hyperspectral phenotypes. Different genomic regions were found for plant growth and hyperspectral phenotypes between with and without PGPB inoculation. Moreover, the hyperspectral phenotypes were associated with genes previously reported as candidates for nitrogen uptake efficiency, tolerance to abiotic stressors, and kernel size. In addition, a Shiny web application was developed to explore multiphenotype genome‐wide association results interactively. Taken together, our results demonstrate the usefulness of hyperspectral‐based phenotyping for studying maize growth‐related traits in response to PGPB inoculation.


| INTRODUCTION
Sustainably increasing food production is required considering growing demands, especially in developing nations (Laurance et al., 2014). Recent studies found that plant growth-promoting bacteria (PGPB) are a viable option for increasing plant resilience to biotic and abiotic stressors, with the potential to increase food production (Batista et al., 2018;Compant et al., 2005;Yassue et al., 2021). PGPB can promote morphological (Mantelin, 2003) and functional (Di Benedetto et al., 2017) changes in plants. Reported effects include increased uptakes of nutrients, such as nitrogen, phosphate, potassium, and iron (Egamberdiyeva, 2007;Pii et al., 2015), and activation of responses to pathogens and abiotic stressors (Olanrewaju et al., 2017;Singh et al., 2018).
One of the challenges associated with assessing the possible benefits of PGPB is identifying plant response. Hyperspectral image data can be used to assess biochemical or physiological attributes of plants; thus, such data have been increasingly applied in plant genetics and management studies because of their associations with target phenotypes, such as water content (550-1750 nm) (Ge et al., 2016), plant nutrient status (350-2500 nm) (Mahajan et al., 2014;Nigon et al., 2021), disease susceptibility (Thomas et al., 2017), yield (400-900 nm) , and biomass (380-850 nm) (Krause et al., 2019).
For example, hyperspectral patterns genetically correlated with target phenotypes can potentially aid genomic prediction (Krause et al., 2019;Sandhu et al., 2021). In addition, hyperspectral phenotypes can be used for genetic inference studies, such as genome-wide association (GWA), heritability, and genetic correlation analyses, to investigate associations between hyperspectral bands and genome (Barnaby et al., 2020;Feng et al., 2017;Sun et al., 2019;Wu et al., 2021).
Bayesian whole-genome regression models are useful for GWA studies because they implicitly account for population structure and the multiple-testing problem of classical single-marker linear mixed models by simultaneously fitting all markers (Fernando et al., 2017;Wolc & Dekkers, 2022). Despite the increasing use of highthroughput phenotyping through the compilation of hundreds or thousands of phenotypes, a limited number of whole-genome regression studies have integrated hyperspectral data into genetic inference research (Barnaby et al., 2020;Sun et al., 2019;Yoosefzadeh-Najafabadi et al., 2021). Moreover, how hyperspectral wavelength data are associated with PGPB responses in maize remains elusive because it is challenging to interpret changes in hyperspectral reflectance patterns with regard to plant biological processes. The objectives of this study were (1) to investigate whether hyperspectral reflectance values are under genomic control, (2) to determine whether variations in hyperspectral reflectance values are correlated with growth-related traits at the genomic level, (3) to identify specific genomic regions associated with wavelengths that can be used to study maize growth-related traits under PGPB inoculation, and (4) develop an open-source a Shiny web application to explore multiphenotype GWA results interactively. We employed Bayesian wholegenome GWA methods to identify possible candidate genes associated with growth-related traits and hyperspectral reflectance bands.

| Genomic data
The genomic data used in this study have been previously published (Fritsche-Neto et al., 2020;Yassue et al., 2021). In brief, the 360 inbred lines were genotyped using the genotyping-by-sequencing method, followed by a two-enzyme (PstI and MseI) protocol (Poland et al., 2012;Sim et al., 2012). Deoxyribonucleic acid was extracted from leaves using cetyltrimethylammonium bromide (Doyle & Doyle, 1987). Single nucleotide polymorphism (SNP) calling was performed using TASSEL 5.0 (Bradbury et al., 2007) with B73 (B73-RefGen_v4) as a reference genome. SNP markers were removed if the call rate was < 90%, nonbiallelic, or if the minor allele frequency was < 5%. Missing marker codes were imputed using Beagle 5.0 software (Browning et al., 2018). Markers with pairwise linkage disequilibrium > 0.99 were removed using the SNPRelate R package (Zheng et al., 2012). The average of linkage disequilibrium decay was 185 kbp considering a conservative cutoff of 0.10. In total, 13,826 SNPs were

| Hyperspectral imaging and processing
Leaves of B+ and BÀ plants were collected, and hyperspectral images were recorded using a benchtop Pika L. camera system (Resonon, Bozeman, MT, USA). The middle portion of the last completely expanded leaf was used as the region of interest for hyperspectral imaging. A dark room with an additional light supply was used to minimize light variation. Radiometric calibration was performed according to the manufacturer's instructions. For each plant, a hyperspectral cube image was produced, which contained 150 bands with wavelengths in the 386-1021 nm range. Image processing through the Spectral Python (Boggs, 2014) module was performed by applying a mask to remove the background from the image, and the mean reflectance of each pixel was used for further analysis. Hyperspectral imaging and processing details are described in . In addition, 131 hyperspectral indices (mathematical band combinations) were calculated based on the mean reflectance value for each wavelength using the R package hsdar (Lehnert et al., 2019). These hyperspectral indices have been reported to be associated with a variety of phenotypes, such as nutrient and chlorophyll content, pigments, photosynthesis, and water content (Gitelson et al., 2014;Ranjan et al., 2012;Zarco-Tejada et al., 2004. A summary of the hyperspectral indices and their correlations are presented in the supporting information Table S1 and Figure S1, respectively. In total, 281 hyperspectral phenotypes were used in the analysis. Of these, 150 were single-band reflectance and 131 were hyperspectral indices.

| Univariate BayesC
BayesC (Kizilkaya et al., 2010) was applied to estimate the markers effect and variance components using the following model.
where y kli is the vector of phenotypes (manually measured or hyperspectral phenotypes) for the kth replication, lth block nested within replication, and ith genotype; μ is the overall mean; r and b are the fixed effects for replication and block nested within replication, respectively; w ij is the incidence matrices of marker covariates for each SNP coded as 0, 1, or 2; and α j is the jth marker effect. The prior of α j was as follows: where σ 2 α is the common marker genetic variance, π is a mixture proportion set to 0.99, and ϵ is the vector of residuals. A Gaussian prior Nð0,σ 2 ϵ Þ was assigned to the vector of residuals, and a flat prior was assigned to μ, r, and b. The scaled inverse χ 2 distribution was assigned to σ 2 α and σ 2 ϵ with the degrees of freedom equal to 4 and choosing the scale parameter such that the prior mean of the variance equals half of the phenotypic variance. The variance components obtained from univariate BayesC were used to estimate genomic heritability , where h 2 g is the genomic heritability, σ 2 g and σ 2 e are the additive genomic and residual variances, respectively, and n r is the number of replication (2).

| Bivariate BayesC
A recent study showed that hyperspectral image data can be used to perform phenomic prediction for PH, SD, SDM with reasonable accuracy ). In the current study, we investigated if these hyperspectral reflectance values are under genetic control and whether they are associated with PH, SD, and SDM. Bivariate BayesC was used to investigate whether they are controlled by the same genomic regions by estimating the genetic correlation between manually measured and hyperspectral phenotypes. The model description follows that of univariate BayesC with some modification. Here, y is the vector of manually measured and hyperspectral phenotypes, and the marker effect of trait t for locus j followed The jth marker effect can be reparameterized as whether the jth marker effect for trait t is zero or nonzero and β j follows a multivariate normal distribution with null mean and covariance , where α 1 , α 1 , and α 12 (α 21 ) are marker genetic variance for trait 1, marker genetic variance for trait 2, and marker genetic covariance between traits 1 and 2, respectively, and the residuals were assumed independently and identically distributed multivariate normal vectors with null mean and covariance matrix Σ ϵ . The covariance matrices, Σ α and Σ ϵ , were assigned an inverse Wishart prior distribution with W À1 ðS α , ν α Þ and W À1 ðS ϵ ,ν ϵ Þ, respectively. We assumed all possible combinations for δ j , namely, (0,0), (0,1), (1,0), and (1,1) having nonzero probability.

| Bayesian GWA analysis
The aforementioned BayesC was used to perform GWA analyses of hyperspectral reflectance values and manually measured phenotypes.
Candidate markers were selected based on their posterior inclusion probabilities. The posterior inclusion probability indicates the probability of inclusion of a given marker in the model (Fernando & Garrick, 2013). According to a previous study (Fan et al., 2011), a posterior inclusion probability threshold of 0.10 was used for manually measured phenotypes, and 0.50 was used as a more conservative threshold for hyperspectral GWA. All Bayesian analyses were fit using The genomic correlation estimates between hyperspectral wavelengths and manually measured traits were largely positive ( Figure 2).

| GWA analyses for growth-related and hyperspectral traits
A total of 86 SNPs were selected from the BayesC analysis using the posterior inclusion probability threshold of 0.10 for PH, SD, and SDM (Figure 3 and supporting information Tables S2-S6). This number was higher than a previous study ) that used a non-Bayesian whole-genome regression model likely because BayesC has a higher statistical power to detect genetic signals. There were eleven common SNPs between the current study and Yassue et al. (2022) (

| Integration of GWA analyses
Eight SNPs influenced both the manually measured and hyperspectral phenotypes, which were visualized in a phenome-wide association plot

| Estimates of genomic heritability and correlation
The hyperspectral phenotypes showed a similar range of genomic heritability estimates relative to that of manually measured phenotypes.
For most hyperspectral-derived phenotypes, the heritability estimates  Table S1.
T A B L E 2 List of candidate genes influencing both manually measured phenotypes and hyperspectral phenotypes. varied from 0.30 to 0.50, indicating that hyperspectral data can capture genetic variation. In addition, the genetic correlation between the manually measured and hyperspectral phenotypes indicated that the same sets of genes probably influenced these responses.
The relatively higher genomic correlation estimates for PH in the spectral range of 400-700 nm (visible spectrum) of BÀ plants may indicate an association between PH and leaf pigments, such as carotenoids, chlorophyll a and b, and nitrogen concentrations (Ayala-Silva & Beyl, 2005;Zhao et al., 2003). PH and other early growth traits are positively associated with the final dry matter yield and hyperspectral reflectance values (Capolupo et al., 2015;Freeman et al., 2007;Prey et al., 2018;Strigens et al., 2012;Williams et al., 2017;Xie et al., 2020) because more vigorous plants tend to be higher. Our previous work demonstrated that SDM, SD, and PH could be predicted with reasonable accuracy using hyperspectral image data . The ability of the hyperspectral phenotypes to predict growth-related phenotypes is likely due to some shared genetic association between them. It is known that hyperspectral phenotypes are related to other types of phenotypes, such as photosynthesis, nutrient uptake, and morphological traits (Krause et al., 2019;Ge et al., 2016;Mahajan et al., 2014;Nigon et al., 2021;Sandhu et al., 2021;Thomas et al., 2017) that were not evaluated in this study but are correlated with plant growth and vigor.
Similarly, higher genomic correlation estimates were observed for SDM at 700-1000 nm (near-infrared). The association between nearinfrared spectra and plant biomass in maize was reported in maize in a previous study (Ma et al., 2020). Wavelengths in this range have also been reported to indicate nitrogen content in rapeseed (Müller et al., 2008) and wheat (Hansen & Schjoerring, 2003).

| GWA analysis
Overall, more SNP associations were observed in B+ than in BÀ plants. However, the genomic heritability estimates were similar between the two managements. This suggests that the genetics underlying hyperspectral responses differ between the two managements. GWA analysis showed that SNP Chr4_245633076 was associated with nine hyperspectral phenotypes pointing to three candidate genes (nrt2, nrt2.2, and Zm00001d054060). These genes have been previously reported as part of NO À 3 transporter gene families and as candidates for nitrate uptake along the primary maize root (Liu et al., 2009;Sorgonà et al., 2011;Wang et al., 2020). The genes Zm00001d029820 and Zm00001d012924 are involved in plant development and environmental stress conditions Zhu et al., 2021). The candidate gene Zm00001d007843 may affect kernel size , and gene Zm00001d012719 is a candidate transcription factor mediating plant responses to abiotic stressors (Vendramin et al., 2020). We did not directly evaluate the phenotypes related to previously reported candidate genes. Nevertheless, hyperspectral signatures may be of use for indirectly assessing these phenotypes. Furthermore, eight SNPs were detected in both the manually measured and hyperspectral phenotypes. Genes Zm00001d052164 and Zm00001d052165 on chromosome 4 regulate nitrogen assimilation , and Zm00001d006916 is a candidate gene responsible for autophagy, which may play a role regarding responses to abiotic stressors . The LOC100216557 gene is associated with resilience of maize to aphids and may be responsible for plant defense responses and stress tolerance (Srivastava et al., 2018), and 103647869 is a candidate gene for resistance to Aspergillus flavus infection or aflatoxin contamination . The gene 100274563 is associated with ear weight per plant (Zhou et al., 2020). The results found in this study must be interpreted with caution because of the low density of SNPs and the relatively small number of lines used.
The Shiny web application with an interactive interface is a powerful tool for the visualization and interpretation of GWA analysis of multiple phenotypes. Two-way Manhattan plots can be used to investigate all associations across traits and managements, including nonsignificant results that are not elaborated on here. In addition, phenome-wide association plots can be used to identify and visualize markers with mutual influence across hundreds or thousands of phenotypes. The Shiny application can be easily extended to other high-throughput phenotyping data, such as longitudinal, fluorescence, and thermal data.

| CONCLUSIONS
The hyperspectral signatures captured some genetic variability in the maize diversity panel and were associated with growth-related traits under PGPB inoculation. GWA analysis of hyperspectral data identified genomic regions that influenced both manually measured phenotypes and hyperspectral bands. In addition, a Shiny web application for multiple-phenotype GWA was developed. and writing-review and editing.